Grouping and Aggregation

Grouping

In data preparation, the grouping function combines all records that have identical values in a particular field, or combination of fields, into a single record.

Grouping reduces the size of the dataset being analyzed, because each group of matching records becomes one record. It is therefore important to know your data well enough to choose grouping fields that give meaningful results. Once the data is grouped, aggregate operations can be performed on each group.
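The grouping step described above can be sketched in Python. This is a minimal illustration, not the product's implementation: the field names `region` and `sales`, the sample values, and the helper `group_by` are all hypothetical. Records that share a `region` value are collected into one group, and each group is then reduced to a single record carrying a Sum aggregate.

```python
from collections import defaultdict

# Hypothetical records; field names and values are illustrative only.
records = [
    {"region": "North", "sales": 100},
    {"region": "South", "sales": 250},
    {"region": "North", "sales": 50},
    {"region": "South", "sales": 150},
    {"region": "East", "sales": 75},
]


def group_by(rows, field):
    """Collect the rows that share the same value in `field` (hypothetical helper)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[field]].append(row)
    return dict(groups)


groups = group_by(records, "region")

# One output record per group, here carrying the sum of `sales` for that group.
grouped = [
    {"region": key, "sales_sum": sum(r["sales"] for r in rows)}
    for key, rows in groups.items()
]

for row in grouped:
    print(row)
```

In this sketch, five input records are reduced to three, one per distinct `region` value. Grouping on a field with very many distinct values would reduce the dataset far less, which is why the choice of grouping field matters.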

Aggregation

Aggregation refers to mathematical operations that take a list of input values and return a single value.

Data preparation aggregate operations include:

Average: sum of all values in the list divided by the number of items in the list
Count: number of dataset rows
Count Distinct: number of distinct data values. Null counts as a distinct value: a column containing the string values Group A, Group B and Null has a distinct count of 3.
Maximum: greatest value in the set
Minimum: least value in the set
Standard Deviation: measure of the spread of the data, that is, the amount of variation around the mean value. This value is the square root of the sample variance (see Variance).
Standard Deviation Population: measure of the spread of the data, that is, the amount of variation around the mean value. This value is the square root of the variance of the whole population (see Variance Population).
Sum: total obtained by adding all of the numbers in the set
Variance: measure of the variation within the data; this value is the unbiased sample variance, calculated by dividing the sum of squared deviations from the mean by n-1, where n is the number of data records.
Variance Population: measure of the variation within the data; this value is the variance of the whole population, calculated by dividing the sum of squared deviations from the mean by n, the total number of data records.
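The operations listed above can be illustrated with Python's standard library. This is a sketch under stated assumptions rather than the product's implementation: the sample lists `values` and `labels` are hypothetical, and Python's `None` stands in for Null. It mainly shows how the sample (n-1) and population (n) variants differ on the same data.

```python
import math
import statistics

# Hypothetical sample data, for illustration only.
values = [2, 4, 4, 4, 5, 5, 7, 9]
labels = ["Group A", "Group B", None, "Group A"]  # None stands in for Null

aggregates = {
    "average": statistics.mean(values),
    "count": len(values),
    # A Python set keeps None as its own member, mirroring the rule
    # above that Null counts as a distinct value.
    "count_distinct": len(set(labels)),
    "maximum": max(values),
    "minimum": min(values),
    "sum": sum(values),
    # Sample statistics divide the squared deviations by n-1.
    "variance": statistics.variance(values),
    "standard_deviation": statistics.stdev(values),
    # Population statistics divide the squared deviations by n.
    "variance_population": statistics.pvariance(values),
    "standard_deviation_population": statistics.pstdev(values),
}

# Each standard deviation is the square root of its matching variance.
assert math.isclose(
    aggregates["standard_deviation"], math.sqrt(aggregates["variance"])
)
assert math.isclose(
    aggregates["standard_deviation_population"],
    math.sqrt(aggregates["variance_population"]),
)

for name, value in aggregates.items():
    print(f"{name}: {value}")
```

For these eight values, the sum of squared deviations from the mean is 32, so the population variance is 32/8 = 4 and the sample variance is 32/7, which is slightly larger. The gap between the two shrinks as the number of records grows.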